[pull] master from DataDog:master#535
Merged
* [Postgres] Add column statistics collection

  Add a new column stats collector that queries pg_stats via the datadog.column_stats() function and submits column-level statistics (avg_width, n_distinct, null_frac) to the dbm-column-stats pipeline.

  Key features:
  - Streams results via cursor with configurable statement_timeout
  - Chunks payloads at 5,000 columns to limit memory footprint
  - Flushes at database boundaries to prevent cross-database accumulation
  - Multi-database support via autodiscovery with per-database error isolation
  - Health events for missing function and insufficient privileges
  - Recovery detection when errors resolve
  - Table include/exclude filtering pushed into SQL
  - APM tracing via @tracked_method
* [Postgres] Move database_monitoring_column_stats to PostgreSql

  Column statistics is currently a postgres-only feature, so the EvP submission helper does not belong on the shared DatabaseCheck base. Define it on the PostgreSql class instead. If MySQL or SQL Server later add column-stats collectors, the method can be promoted up to db.py without caller-side changes.

  This also avoids cross-package release sequencing: a postgres PR importing a new datadog_checks_base API would have to wait on a base release before merging.
* [Postgres] Add regression tests for schema collector include filters

  Existing schemas tests covered the exclude_databases / exclude_schemas / exclude_tables paths but never include_*. Add coverage for the include paths and for include + exclude combinations across all three filter levels.

  These tests serve as a baseline before refactoring the filter-clause construction into a shared helper used by both the schemas and column_stats collectors.
* [Postgres] Share filter clause helpers between schemas and column_stats

  Extract two small helpers, regex_exclude_clauses and regex_include_clause, into a new filters module, and use them from both the schemas and column_stats collectors.

  The schemas methods (_get_schemas_query, _get_tables_query) keep the same shape and behavior; column_stats's _build_filters now composes both schema and table filters via the same helpers.

  While we are here, bring column_stats's filter set up to parity with schemas by adding include_databases / exclude_databases (applied in Python against the autodiscovered list) and include_schemas / exclude_schemas (applied in SQL via the shared helpers). All four are empty by default.

  Coverage:
  - Unit tests for the helpers in test_filters.py (12 cases).
  - Integration tests in test_column_stats.py for include_schemas, exclude_schemas, include_and_exclude_schemas, include_databases, exclude_databases, include_and_exclude_databases.
  - test_config_defaults updated for the four new knobs.
* [Postgres] Default collect_column_stats.enabled to false

  Customers must opt in by setting collect_column_stats.enabled: true in their instance config. Matches the default-off posture used by collect_schemas.
* [Postgres] Update test_config feature checks for default-off column stats

  test_initialize_features_enabled_and_disabled relied on collect_column_stats defaulting to true; now that it defaults to false, enable it explicitly alongside the other features. Also assert it is disabled in the disabled-by-default test.
* [Postgres] Parameterize regex include/exclude filters

  Address review feedback to switch from f-string interpolation to psycopg parameterized queries for the regex-based include/exclude filters. Helpers in filters.py now produce only `%s` placeholders, and the collectors thread the pattern values through cursor.execute(query, params). Mirrors the pattern already used by schemas._get_databases().

  Changes:
  - filters.py: helpers emit `%s` placeholders; pattern values no longer appear in the SQL string.
  - column_stats._build_filters returns (sql, params); _collect_for_database passes params to cursor.execute.
  - schemas._get_schemas_query / _get_tables_query / get_rows_query now return (sql, params); _get_cursor unpacks; _get_databases refactored to use the helpers.
  - SCHEMA_QUERY: escape literal % as %% in LIKE clauses now that the query goes through parameterized execute.
  - test_filters: assertions now check placeholder shape, not pattern values.
  - test_column_stats: split special-chars test into two clearer tests (one for valid patterns containing quotes, one for invalid regex).
* [Postgres] Rename column_stats to column_statistics throughout

  Use the full word 'column_statistics' instead of the abbreviated 'column_stats' for the new collector. Renames symbols, config keys, file names, metric names, the SECURITY DEFINER function, the EvP track type, and the dbm_type payload field.

  Companion agent forwarder PR and backend dbm-metrics-intake registration will be updated to match in a follow-up.
* [Postgres] Column statistics: tighten defaults, GCD scheduling, add diagnose probe
* [Postgres] Release column statistics collector on cancel
* [Postgres] Collect inherited, correlation, most_common_freqs in column statistics
* [Postgres] Capture column statistics collector before cancel in default-config test
* [Postgres] Address column statistics review feedback
* [Postgres] Drop column_statistics.max_query_duration config
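The chunking and database-boundary flushing described in the first commit can be sketched as a generator. This is a minimal illustration under stated assumptions, not the collector's actual code: the names `chunk_column_rows` and `MAX_COLUMNS_PER_PAYLOAD` are hypothetical, and rows are assumed to arrive as `(dbname, rows)` pairs where `rows` can be any iterable, such as a streaming cursor.

```python
from typing import Any, Iterable, Iterator, List, Tuple

# 5,000 comes from the PR description; the constant name is hypothetical.
MAX_COLUMNS_PER_PAYLOAD = 5000


def chunk_column_rows(
    rows_by_db: Iterable[Tuple[str, Iterable[Any]]],
    max_columns: int = MAX_COLUMNS_PER_PAYLOAD,
) -> Iterator[Tuple[str, List[Any]]]:
    """Yield (dbname, batch) pairs with at most max_columns rows each.

    A batch never mixes rows from two databases.
    """
    for dbname, rows in rows_by_db:
        batch: List[Any] = []
        for row in rows:  # rows can be a lazily iterated cursor
            batch.append(row)
            if len(batch) >= max_columns:
                yield dbname, batch
                batch = []
        # Flush the remainder at the database boundary before moving on.
        if batch:
            yield dbname, batch


if __name__ == "__main__":
    streams = [
        ("app", ({"column": i} for i in range(12001))),
        ("billing", [{"column": 0}]),
    ]
    print([(db, len(batch)) for db, batch in chunk_column_rows(streams)])
    # [('app', 5000), ('app', 5000), ('app', 2001), ('billing', 1)]
```

Because each database's remainder is flushed before the next database's rows are read, a payload never combines databases, and only the current batch is held in memory at a time.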
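The parameterized include/exclude filters from the "Share filter clause helpers" and "Parameterize regex include/exclude filters" commits can be illustrated with a small sketch. Only the helper names `regex_exclude_clauses` and `regex_include_clause` come from the commits; their signatures, the `build_filters` composer, and the use of PostgreSQL's `~` / `!~` regex operators against pg_stats' `schemaname` and `tablename` columns are assumptions. Column names are trusted constants interpolated into the SQL, while every user-supplied pattern travels as a `%s` parameter.

```python
from typing import List, Sequence, Tuple


def regex_exclude_clauses(column: str, patterns: Sequence[str]) -> Tuple[List[str], List[str]]:
    """One `column !~ %s` clause per pattern; every clause must hold."""
    return [f"{column} !~ %s" for _ in patterns], list(patterns)


def regex_include_clause(column: str, patterns: Sequence[str]) -> Tuple[str, List[str]]:
    """A single OR-ed clause matching any pattern, or '' when there are none."""
    if not patterns:
        return "", []
    return "(" + " OR ".join(f"{column} ~ %s" for _ in patterns) + ")", list(patterns)


def build_filters(
    include_schemas: Sequence[str],
    exclude_schemas: Sequence[str],
    include_tables: Sequence[str],
    exclude_tables: Sequence[str],
) -> Tuple[str, List[str]]:
    """Compose schema and table filters into an ' AND ...' suffix plus params."""
    clauses: List[str] = []
    params: List[str] = []
    # Column names are fixed identifiers, never user input, so f-strings are safe here.
    for column, include, exclude in (
        ("schemaname", include_schemas, exclude_schemas),
        ("tablename", include_tables, exclude_tables),
    ):
        include_sql, include_params = regex_include_clause(column, include)
        if include_sql:
            clauses.append(include_sql)
            params.extend(include_params)
        exclude_sql, exclude_params = regex_exclude_clauses(column, exclude)
        clauses.extend(exclude_sql)
        params.extend(exclude_params)
    return "".join(f" AND {clause}" for clause in clauses), params


if __name__ == "__main__":
    sql, params = build_filters(["^public$"], [], [], ["^tmp_", "_bak$"])
    # Literal % in LIKE is doubled because the query will be executed with params.
    query = "SELECT attname FROM pg_stats WHERE schemaname NOT LIKE 'pg_toast%%'" + sql
    print(query)
    print(params)
    # cursor.execute(query, params)  # psycopg binds each %s to the matching pattern
```

When a query is passed to `cursor.execute(query, params)`, psycopg replaces each `%s` with a bound value and reads `%%` as a literal `%`, which is why the SCHEMA_QUERY LIKE clauses needed their `%` characters doubled once the query started going through parameterized execution.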
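The commits state that include_databases / exclude_databases are applied in Python against the autodiscovered database list rather than in SQL. A plausible sketch of that step follows; `filter_databases` is a hypothetical name, and both the `re.search` matching semantics and the rule that an exclude match overrides an include match are assumptions, not confirmed behavior.

```python
import re
from typing import List, Sequence


def filter_databases(
    databases: Sequence[str],
    include: Sequence[str],
    exclude: Sequence[str],
) -> List[str]:
    """Keep databases that match an include pattern (or all, if include is empty)
    and match no exclude pattern. Exclude wins over include (assumed)."""

    def matches_any(name: str, patterns: Sequence[str]) -> bool:
        # re.search (match anywhere in the name) is an assumed semantic.
        return any(re.search(pattern, name) for pattern in patterns)

    return [
        db
        for db in databases
        if (not include or matches_any(db, include)) and not matches_any(db, exclude)
    ]


if __name__ == "__main__":
    discovered = ["postgres", "app", "app_test", "template1"]
    print(filter_databases(discovered, include=[], exclude=["^template", "_test$"]))
    # ['postgres', 'app']
```

With both lists empty by default, every autodiscovered database passes through unchanged, consistent with the commit's note that the four new filters are empty by default.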
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)